Design of efficient binary multiplier architecture using hybrid compressor with FPGA implementation

In signal processing, multipliers are an essential component of the arithmetic functional units used in many applications, such as digital signal processors, image/video processing, machine learning, cryptography and arithmetic and logic units (ALUs). Many multiplier architectures have been proposed in recent years. Among them, the Vedic multiplier is a high-performance design that is used in signal and image processing applications. To improve the performance of this multiplier further, a novel multiplier based on a hybrid compressor is proposed. The proposed hybrid compressor-based multiplier is designed and implemented on a Field Programmable Gate Array (FPGA, Spartan-6). The synthesis results show that the speed of the proposed hybrid compressor-based multiplier improves on the Array multiplier (35.83%), Wallace tree multiplier (34.58%), Vedic multiplier based on Carry Look Ahead adder (CLA) (28.49%), Vedic multiplier based on Ripple Carry Adder (RCA) (20.65%), Booth multiplier (21.65%), Vedic multiplier based on Han-Carlson Adder (HCA) (20.10%), hybrid multiplier using Carry Select Adder (CSELA) (17.81%) and hybrid Vedic multiplier (7.15%).

Delay due to the rippling of long carry propagation chains is a major issue in adders. Hence, there is a need to design high-speed, low-complexity adder architectures [5].
The ripple carry adder consumes less power and occupies less area than most other adder architectures. However, its delay increases linearly with the bit width of the adder, which makes it unsuitable for high-speed applications [6].
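The linear growth of the ripple carry delay can be illustrated with a minimal behavioural sketch. The function names `full_adder` and `ripple_carry_add` are illustrative assumptions and do not come from the cited designs; the model only shows that each bit position must wait for the carry of the previous one.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width=8, cin=0):
    """Add two width-bit integers one bit at a time.

    Returns (sum, carry_out). Stage i cannot finish before stage i-1
    has produced its carry, so the carry chain is `width` stages long.
    """
    carry = cin
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry


print(ripple_carry_add(200, 100))  # (44, 1), i.e. 300 = 256 + 44
```

The loop body is executed once per bit, which mirrors the one-full-adder-per-bit carry chain of the hardware.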
In a carry save adder, three bits are added simultaneously, and the carries are stored in the present stage rather than propagated through the subsequent stages. The speed of this adder improves because the carries are saved instead of rippled. The advantage of this adder is that it adds three input values at a time. Its disadvantages are that it occupies a large area, owing to the larger number of transistors, and consumes more power.
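The carry save principle can be sketched at the word level as follows. This is a behavioural illustration under the assumption of unsigned operands; the name `carry_save_add` is hypothetical.

```python
def carry_save_add(x, y, z, width=8):
    """Reduce three operands to a sum vector and a carry vector.

    Every bit position acts as an independent full adder, so no carry
    moves between positions inside this step. The carry vector is
    shifted left by one because each carry has twice the weight of
    the column that produced it.
    """
    mask = (1 << width) - 1
    x, y, z = x & mask, y & mask, z & mask
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vec, carry_vec


s, c = carry_save_add(5, 6, 7)
print(s, c, s + c)  # 4 14 18
```

A single carry-propagating addition of the two vectors is still needed at the end, which is why carry save stages are typically used to accumulate many operands before one final adder.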
A few high-speed adders are available in the literature, such as the carry skip adder, the carry look ahead adder (CLA), the conditional adder, the carry select adder (CSELA) and their combinations.
The Carry Look Ahead adder (CLA) is one of the fast adders because, based on the generate and propagate principle, the sum and carry can be produced at the same time. Here, every carry is expressed in terms of the operand bits and the input carry only, irrespective of the bit size, so no stage waits for an intermediate carry. The advantage of this adder is its lower delay compared with the Ripple Carry adder (RCA). Its disadvantage is that it occupies a large area because of the separate circuits for sum and carry generation. In addition, the complexity of the circuit increases with the bit size [7].
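A minimal sketch of the generate and propagate principle is given below. It assumes the usual definitions g = a AND b and p = a XOR b, and the function name `cla_add` is illustrative. Each carry is written as a flat expression of g, p and Cin, which is what allows all carries to be evaluated in parallel in hardware.

```python
def cla_add(x, y, width=4, cin=0):
    """Carry look-ahead addition. Returns (sum, carry_out)."""
    a = [(x >> i) & 1 for i in range(width)]
    b = [(y >> i) & 1 for i in range(width)]
    g = [ai & bi for ai, bi in zip(a, b)]  # generate signals
    p = [ai ^ bi for ai, bi in zip(a, b)]  # propagate signals

    carries = [cin]
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]cin
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        term = cin
        for k in range(i + 1):
            term &= p[k]
        c |= term
        carries.append(c)

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]


print(cla_add(9, 7))  # (0, 1), i.e. 16 = 1 * 16 + 0
```

The number of product terms per carry grows with the bit position, which reflects the increase in circuit complexity with bit size noted above.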
Another fast adder is the Carry Select adder (CSELA). This adder performs the addition for both assumed values of the input carry (Cin = 0 and Cin = 1) and then selects the correct result. It improves the speed compared with the Ripple Carry adder (RCA) and the Carry Look Ahead adder (CLA), but it consumes more area because of the dual Ripple Carry adders (RCAs).
To reduce the size of the CSELA, the Binary to Excess-1 Converter (BEC) is introduced. In the modified CSELA, the second RCA (for Cin = 1) is replaced by a BEC. The BEC requires fewer gates (transistors) than the RCA; hence the area is reduced [7].
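The following behavioural sketch illustrates the BEC-based carry select idea under simplifying assumptions: every block is treated the same way, the block result for Cin = 0 is modelled with an integer addition instead of a gate-level RCA, and the names `bec` and `carry_select_add_bec` are hypothetical.

```python
def bec(value, width):
    """Binary to excess-1 converter: (value + 1) mod 2**width.

    Output bit i is input bit i XORed with the AND of all lower
    input bits, so only XOR and AND gates are needed.
    """
    out = 0
    all_lower_ones = 1
    for i in range(width):
        bit = (value >> i) & 1
        out |= (bit ^ all_lower_ones) << i
        all_lower_ones &= bit
    return out


def carry_select_add_bec(x, y, width=8, block=4, cin=0):
    """Carry select addition; `width` must be a multiple of `block`."""
    mask = (1 << block) - 1
    carry = cin
    result = 0
    for lo in range(0, width, block):
        xb = (x >> lo) & mask
        yb = (y >> lo) & mask
        raw0 = xb + yb               # block result assuming Cin = 0
        raw1 = bec(raw0, block + 1)  # block result assuming Cin = 1
        raw = raw1 if carry else raw0  # multiplexer driven by real carry
        result |= (raw & mask) << lo
        carry = raw >> block
    return result, carry


print(carry_select_add_bec(200, 100))  # (44, 1)
```

The BEC operates on block + 1 bits because it increments the sum bits and the carry-out of the Cin = 0 result together.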
The delay of the Carry Select adder (CSELA) is further improved by introducing the concept of the parallel prefix adder, in particular the Brent-Kung adder (BK adder). The delay of the circuit is improved by replacing the RCA for Cin = 0 with a BK adder in the first stage and the RCA for Cin = 1 with a Binary to Excess-1 Converter (BEC) in the second stage [8].
Furthermore, Common Boolean Logic (CBL) was developed to reduce area utilization. In this logic, existing resources are shared to minimize the number of gates. In the CSELA, the RCA for Cin = 1 is replaced by CBL [8].
To optimize the speed and area of CSELA addition, the Han-Carlson adder is further introduced. This adder combines the features of the Kogge-Stone and Brent-Kung (BK) adders, since the Kogge-Stone adder has a lower delay and the Brent-Kung adder uses less area. The combination of a BK adder (for Cin = 0) and a Kogge-Stone adder (for Cin = 1) is named the Han-Carlson adder [9].
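As an illustration of parallel prefix carry computation, the sketch below models a Kogge-Stone prefix network only. It is not the Han-Carlson or CSELA structure described above; the function name `kogge_stone_add` and the zero carry-in are assumptions made for brevity.

```python
def kogge_stone_add(x, y, width=8):
    """Parallel prefix addition with a Kogge-Stone network (Cin = 0).

    Returns (sum, carry_out). After the level with distance d, the
    pair (G[i], P[i]) covers the bit range i-2d+1 .. i, so about
    log2(width) levels are needed instead of `width` ripple stages.
    """
    g = [((x >> i) & (y >> i)) & 1 for i in range(width)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        new_g, new_p = G[:], P[:]
        for i in range(dist, width):
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
    # G[i] is now the carry out of bit position i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]
```

A Brent-Kung network computes the same prefix result with fewer cells and more levels, which is the area versus delay trade-off mentioned above.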
Introducing parallel carry computation into the addition to improve the speed of operation is known as the Weinberger recurrence algorithm. In the Han-Carlson adder (HCA), the BK adder (for Cin = 0) is replaced by a modified BK adder, and the Kogge-Stone adder (for Cin = 1) is replaced by a 5-bit Binary to Excess-1 Converter (BEC) module [9].

Review of multipliers
In numerous applications, multiplication is one of the prominent components that dominate the overall performance of a signal processing system. Multiplication is a key arithmetic operation and a primitive component that determines the performance of various digital processors, machine learning, Internet of Things, deep learning, Finite Impulse Response (FIR) filters, convolution, Fast Fourier Transform (FFT), distributed computing, Arithmetic and Logic Units (ALU), signal/image processing and multimedia applications. It determines the area, delay and overall performance of parallel implementations [10].
Signal processing devices such as smartphones, laptops, tablets and personal computers require high-performance multipliers with minimum area utilization. Various methods are available to implement the multiplication operation.
Multipliers are mainly categorized into two types, serial and parallel. In serial multiplication, every bit of the multiplier is used in turn to calculate the partial products. In parallel multiplication, the partial products of every bit of the multiplier are calculated in parallel. The performance (speed) of the multiplication mainly depends on the generation of partial products. Parallel implementation optimizes the speed at the cost of area utilization [11].
In the multiplication operation, a bit-by-bit AND operation is followed by the addition of the partial products with the help of half and full adders. The speed of a multiplication mostly depends on the number of partial products generated and on their accumulation.
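The two steps, AND-based partial product generation followed by accumulation, can be sketched as follows for unsigned operands. The function names are illustrative, and the accumulation uses Python integer addition in place of the half and full adder network.

```python
def partial_products(a, b, width=4):
    """Return one partial-product row per multiplier bit.

    Row j is (a AND b_j), formed bit by bit and shifted left by j
    so that it carries the correct weight.
    """
    rows = []
    for j in range(width):
        bj = (b >> j) & 1
        row = 0
        for i in range(width):
            ai = (a >> i) & 1
            row |= (ai & bj) << i
        rows.append(row << j)
    return rows


def multiply(a, b, width=4):
    """Accumulate the partial-product rows into the final product."""
    return sum(partial_products(a, b, width))


print(partial_products(0b1011, 0b0110))  # [0, 22, 44, 0]
print(multiply(0b1011, 0b0110))          # 66
```

An n × n multiplication produces n rows, so any scheme that reduces the number of rows or the depth of their accumulation shortens the critical path.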
Multipliers are also categorized by how they reduce the partial products. These include Array, Wallace, Bypassing, Booth, Vedic, Booth-recoded Wallace tree, Baugh-Wooley and Braun multiplication, among others [12].
Approximation provides an alternative technique that trades off the accuracy of multiplication without requiring an additional compensating circuit.
In addition to the above, truncated multiplication is a highly specialized type of multiplication that determines only part of the product. Approximate computation is well suited to error-tolerant and energy-efficient applications that can inherently tolerate inaccuracy, such as signal processing and multimedia applications. Approximate computing reduces the accuracy of multiplication; nevertheless, it provides a faster result with less power consumption, and it is used in parts of the arithmetic circuits in signal processing applications.
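One simple form of truncated multiplication can be sketched by discarding the partial-product bits in the least significant columns. The scheme below, including the parameter name `drop` and the absence of any error compensation, is an assumption chosen for illustration and is not taken from a specific cited design.

```python
def truncated_multiply(a, b, width=8, drop=4):
    """Approximate a * b for unsigned width-bit operands.

    Partial-product bits whose column index (i + j) is below `drop`
    are never generated, which saves the AND gates and adders of
    those columns at the cost of a bounded error.
    """
    total = 0
    for i in range(width):
        for j in range(width):
            if i + j >= drop:
                bit = ((a >> i) & 1) & ((b >> j) & 1)
                total += bit << (i + j)
    return total


def max_truncation_error(drop):
    """Worst-case error for drop <= width: column k holds k + 1 bits."""
    return sum((k + 1) << k for k in range(drop))


print(max_truncation_error(4))                    # 49
print(255 * 255 - truncated_multiply(255, 255))   # 49
```

The approximate result never exceeds the exact product, and the error is confined to the discarded columns, which is why such schemes suit error-tolerant workloads.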
Various arithmetic architectures have been designed using exact and deterministic principles. However, many applications, such as multimedia and signal/image processing, can tolerate errors and still generate results that are good enough for human perception. Since inexact solutions are sufficient in these error-tolerant applications, computing systems can relax accuracy while maintaining the quality of the design. Hence, it is necessary to analyze and investigate approximate addition and multiplication.

Review of compressor
Multipliers are ubiquitous and essential arithmetic modules in Very Large-Scale Integration (VLSI) architectures, particularly in signal processing applications and general-purpose processors. The speed and power utilization of the multipliers are the main parameters that determine system performance. Optimum multiplication performance is achieved by minimizing the number of partial products generated during the multiplication process, within the limits of power consumption. Hence, compressors are designed specifically as arithmetic modules for optimizing the speed and area of an arithmetic unit [13].
Various methods are extensively applied to minimize the partial products, such as recoding techniques, the Booth recoder, Wallace tree multiplication using carry save adders, and Dadda multiplication using a modified Wallace tree design with carry save adders.

Compressor
The compressor is an important module that is widely used in Very Large-Scale Integration (VLSI) circuits and systems and their applications. It is commonly used as a processing element.
An (m, n) compressor takes m input bits together with a carry input Cin and produces n output bits together with a carry output Cout. The main advantage of compressors is that they produce the output without rippling the carry, because Cout does not depend on the input carry Cin. In addition, the horizontal and vertical signal paths of compressors are simpler and more regular than those of other existing techniques [14].

3:2 compressor
The most common and simplest compressor is the 3:2 compressor, which is also called a full adder. It is a single-bit adder with three inputs and two outputs, and it has been designed using many Very Large-Scale Integration (VLSI) logic circuit design techniques. It mainly consists of three modules: the first computes the XOR/XNOR operation, the second determines the output Sum, and the third evaluates the carry output signal. A full adder can be constructed from these three modules.
The general equation for the 3:2 compressor is A + B + C = Sum + 2 × Carry, where A, B and C are the inputs and Sum and Carry are the outputs.
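The three-module view of the 3:2 compressor can be checked exhaustively with a short sketch. The realization of the carry module as a 2:1 multiplexer selected by A XOR B is one common choice and is assumed here; the helper names `mux` and `compressor_3_2` are hypothetical.

```python
from itertools import product


def mux(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def compressor_3_2(a, b, c):
    """3:2 compressor (full adder) built from three modules."""
    x = a ^ b             # module 1: XOR of the first two inputs
    s = x ^ c             # module 2: Sum
    carry = mux(x, a, c)  # module 3: Carry (a == b when x is 0)
    return s, carry


# Exhaustive check of the defining relation A + B + C = Sum + 2 * Carry.
for a, b, c in product((0, 1), repeat=3):
    s, carry = compressor_3_2(a, b, c)
    assert a + b + c == s + 2 * carry
```

When A XOR B is 0 the two inputs are equal, so either of them gives the carry; otherwise the carry equals the third input.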
The block diagram and structure of the 3:2 compressor are shown in Fig. 1 [14]. This structure has two XOR gates in its critical path. The output Sum is computed by the second XOR gate, and Carry is produced by the multiplexer (MUX) [13]. The output equations of the 3:2 compressor are Sum = A ⊕ B ⊕ C and Carry = (A ⊕ B)·C + ¬(A ⊕ B)·A. The 3:2 compressor has a lower delay than the conventional full adder. To improve the speed of the compressor further, the XOR gate is replaced by a multiplexer, whose selection input is available before the input signal arrives.

The relation between the inputs and outputs of the 4:2 compressor is X1 + X2 + X3 + X4 + Cin = Sum + 2 × (Carry + Cout), where X1 to X4 are the four inputs, Cin is the carry input, and Sum, Carry and Cout are the outputs. A 4:2 compressor is generally designed as a combination of multiplexers and XOR gates. The simplest 4:2 compressor is built by cascading two full adders, as shown in Fig. 2 [13]. It achieves a critical path delay of three XOR gate delays [14].
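The cascaded full adder form of the 4:2 compressor can be verified with the following sketch. The input names x1 to x4 and the function names are assumptions made for illustration; the check confirms that the weighted outputs equal the arithmetic sum of the inputs and that Cout is independent of Cin.

```python
from itertools import product


def full_adder(a, b, c):
    """Full adder: returns (sum, carry) with majority-based carry."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor as two cascaded full adders.

    Returns (sum, carry, cout). Cout is produced by the first full
    adder from x1..x3 only, so it cannot ripple from Cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
    assert x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)
    # Cout must be the same for both values of Cin.
    assert cout == compressor_4_2(x1, x2, x3, x4, 1 - cin)[2]
```

Because Cout does not depend on Cin, a row of such compressors can be chained horizontally without creating a ripple path.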
To optimize the performance of the 4:2 compressor, the multiplexer is designed using full-swing pass transistor logic to achieve optimized power utilization. The advantage of this design is that Sum generation does not depend on Cout generation, in contrast to the 4:2 compressor structure shown in Fig. 2. This design also achieved an 18% delay improvement compared with the conventional full adder design.
To improve the speed of operation further, a modification of the compression unit was proposed that rearranges the Boolean equations to reduce the delay of the carry computation. The modified compression unit was designed by combining a NAND and a NOR gate into an XOR gate.
The performance of arithmetic circuits can be optimized by rearranging the Boolean/logic equations or by deriving new logic equations from the truth table. The substitution of multiplexers for XOR logic at suitable places in the existing technique is implemented through Shannon's expansion technique [15].
The performance of these systems is mainly determined by the speed of their arithmetic calculations. Arithmetic units such as adders, shifters and multipliers are the crucial modules of any signal processing application. In addition, existing digital system architectures are inherently slow because of the large carry computation delay and their sequential behavior.
Also, the multiplication operation determines the speed of most Digital Signal Processing (DSP) applications; hence a high-speed multiplier is required for efficient data path circuit design. Since multiplication leads to a series of additions of partial products, the speed of a multiplier is improved by minimizing the number of partial products.

Proposed multiplier
The major constraint of multipliers is the speed (delay) of operation; hence it is necessary to focus on the critical path delay of a multiplier. Therefore, a hybrid compressor-based multiplier is proposed to optimize the delay of the multiplier compared with the existing methods.

Hybrid technique
Using more than one logic structure (style) to design the modules of a system is known as the hybrid technique.
Two types of logic styles are followed in the hybrid technique: (1) homogeneous styles, in which the same type of circuit style is used in all stages, and (2) heterogeneous styles, in which different types of circuit style are used in different stages. Consider the example of a full adder. The full adder contains three modules: two half adder units and one OR gate module. When two different logic styles are used in the half adders and another logic style is used in the OR gate module, the resulting structure is called a hybrid full adder. The hybrid technique optimizes the speed, size and power utilization of the circuit. The basic architecture of the hybrid technique is shown in Fig. 3 [16].

Hybrid compressor
To optimize the delay and size of the multiplication architecture, the above-mentioned compressors are incorporated in different modules (stages) of the multiplier architecture. The architecture of the 12:6 hybrid compressor is shown in Fig. 4. This compressor consists of three stages, each with a 4:2 compressor. The 4:2 compressors design-1, design-2 and design-3 are used in stage 1, stage 2 and stage 3, respectively, and are shown in Figs. 5, 6 and 7, respectively.
The 4:2 compressors of Figs. 5 and 6 achieve a considerable reduction in area delay product and power delay product compared with existing designs [13]. They also improve the delay by a minimum of 7% and a maximum of 12.5% [13]. Consequently, these compressors are suitable for high-performance multipliers and their related applications.

Proposed multiplication using hybrid compressor
The block diagram of the proposed hybrid compressor-based multiplier is shown in Fig. 11. This structure is a 4 × 4 multiplier. The partial products are computed with a series of AND gates. The proposed hybrid compressor-based multiplier consists of three stages. In every stage, hybrid compressors of various sizes and half adders are used. Namely, a 12:6 compressor (a combination of various design styles of the 4:2 compressor, comprising three 4:2 compressors) is used in the first stage, a 5:3 compressor (a combination of various design styles of the 3:2 compressor, comprising two 3:2 compressors) is used in the second stage, and a single 3:2 compressor is used in the third stage.
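The general idea of compressor-based partial product reduction in a 4 × 4 multiplier can be sketched as below. This is not the 12:6, 5:3 and 3:2 arrangement of Fig. 11: it is a generic column reduction that uses only 3:2 compressors, and the function names as well as the final integer addition (standing in for the last carry-propagating stage) are assumptions.

```python
def full_adder(a, b, c):
    """3:2 compressor: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def reduce_columns(cols):
    """One reduction stage: compress every group of three bits in a
    column into one sum bit (same column) and one carry bit (next)."""
    nxt = [[] for _ in range(len(cols) + 1)]
    for k, bits in enumerate(cols):
        idx = 0
        while len(bits) - idx >= 3:
            s, c = full_adder(*bits[idx:idx + 3])
            nxt[k].append(s)
            nxt[k + 1].append(c)
            idx += 3
        nxt[k].extend(bits[idx:])  # leftover bits pass through
    return nxt


def compressor_multiply(a, b, width=4):
    """Unsigned multiplication by column compression."""
    cols = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            # AND gate array: partial-product bit of weight 2**(i + j)
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    while any(len(bits) > 2 for bits in cols):
        cols = reduce_columns(cols)
    row0 = sum(bits[0] << k for k, bits in enumerate(cols) if len(bits) > 0)
    row1 = sum(bits[1] << k for k, bits in enumerate(cols) if len(bits) > 1)
    return row0 + row1  # final two-operand addition


print(compressor_multiply(13, 11))  # 143
```

Each stage preserves the weighted sum of all bits, so the result equals the exact product once at most two bits remain per column.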

Performance analysis
The different multiplier, adder and compressor techniques are discussed in the "Review of adders", "Review of multipliers" and "Review of compressor" sections, and the proposed multiplier is described in the "Proposed multiplier" section. All multiplier architectures were simulated in Xilinx ISE (Integrated Software Environment). Figures 12 and 13 show the input/output waveforms and the percentage of device utilization of the hybrid compressor-based multiplier, respectively. The same design is implemented on a Spartan-6 Field Programmable Gate Array (FPGA) device. The inputs and outputs of all the multiplication techniques were verified individually.
The synthesis results report the delay, the number of Look-Up Tables (LUTs) (size) and the power consumption of the various multiplier techniques, as shown in Table 1. The percentage speed improvement, in terms of delay, of the hybrid compressor-based multiplier over the existing multiplier techniques is shown in Fig. 15.
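The derived metrics can be illustrated with a small sketch. The formulas are assumptions: the improvement is taken as the relative reduction in delay, and the area delay product (ADP) and power delay product (PDP) are taken as simple products with the LUT count standing in for area. The numerical values are hypothetical and are not taken from Table 1.

```python
def improvement_percent(existing_delay, proposed_delay):
    """Relative delay reduction of the proposed design, in percent."""
    return (existing_delay - proposed_delay) / existing_delay * 100.0


def area_delay_product(luts, delay_ns):
    """ADP, with the LUT count used as the area measure (assumption)."""
    return luts * delay_ns


def power_delay_product(power_mw, delay_ns):
    """PDP in mW * ns, which is numerically equal to picojoules."""
    return power_mw * delay_ns


# Hypothetical example values, for illustration only.
existing_ns, proposed_ns = 10.0, 8.0
print(improvement_percent(existing_ns, proposed_ns))  # 20.0
print(area_delay_product(30, proposed_ns))            # 240.0
print(power_delay_product(15.0, proposed_ns))         # 120.0
```

Lower ADP and PDP values indicate a better trade-off between speed and area or energy, respectively.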
The comparison shows that the speed, in terms of delay, of the hybrid compressor-based multiplier is improved by 35.83%, 34.58%, 21.65%, 28.49%, 20.65%, 20.10%, 17.81% and 7.15% compared with the Array multiplier, Wallace tree multiplier, Booth multiplier, Vedic multiplier using Carry Look Ahead adder (CLA), Vedic multiplier using Ripple Carry Adder (RCA), Vedic multiplier using Han-Carlson Adder (HCA), hybrid multiplier using Carry Select Adder (CSELA) and hybrid Vedic multiplier, respectively. The delay improvement (%) of the proposed multiplier using the hybrid compressor is charted in Fig. 15. The delays of various multiplication techniques reported in research articles are shown in Table 2, which shows a significant delay improvement for the proposed multiplier using the hybrid compressor.

Conclusion
In this investigation, a hybrid compressor-based multiplier architecture is evaluated with various design styles of compressor. The results clearly indicate that the hybrid compressor-based multiplier offers a considerable improvement in multiplication speed with reduced size compared with existing multiplication techniques. The proposed hybrid compressor-based multiplier architecture is successfully synthesized and simulated using Xilinx software and implemented on Field Programmable Gate Array (FPGA) boards. The synthesis results show that the delay of the proposed hybrid compressor-based multiplier is improved by 35.83%, 34.58%, 21.65%, 28.49%, 20.65%, 20.10%, 17.81% and 7.15% compared with the Array multiplier, Wallace tree multiplier, Booth multiplier, Vedic multiplier using Carry Look Ahead adder (CLA), Vedic multiplier using Ripple Carry Adder (RCA), Vedic multiplier using Han-Carlson Adder (HCA), hybrid multiplier using Carry Select Adder (CSELA) and hybrid Vedic multiplier, respectively. The comparative analysis of the implementation results leads the authors to conclude that the proposed hybrid compressor-based multiplier is a desirable choice for the implementation of high-performance signal processing [20], image processing [21] and other related applications [22].

Figure 12. Simulation result of the proposed hybrid compressor-based multiplier.

Figure 13. Device utilization for the proposed multiplier using the hybrid compressor.

Figure 14. Analysis of ADP and PDP of various multipliers.

Figure 15. Percentage of delay improvement of the hybrid compressor-based multiplier compared with other multipliers.
In the modified 3:2 compressor, the selection input of the multiplexer is available before the input signal arrives. Hence, the switching time in the critical path of the transistors is reduced, which reduces the delay of the circuit significantly. The output equations of the modified 3:2 compressor reflect this replacement of the XOR gate by a multiplexer.

4:2 compressor
Generally, a 4:2 compressor is a combination of a pair of full adders. The general structure of the 4:2 compressor is shown in Fig. 2. It accepts four inputs together with one carry input and compresses them into two outputs, namely Sum and Carry. It also generates the intermediate carry bit Cout.

Table 1. Simulation results of different multipliers with Spartan-6 FPGA implementation. ADP, area delay product; PDP, power delay product; LUTs, look-up tables.

Table 2. Delay analysis of 8 × 8 multipliers in existing articles.